perm filename NLM.COR[AM,DBL] blob sn#413893 filedate 1979-01-28 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00015 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	.indexchar←null
C00003 00003
C00008 00004	.sec Background and Rationale
C00012 00005
C00024 00006	.ss Reasoning, REASONBACK:
C00034 00007
C00043 00008	.sss Discovery and Theory Formation, DISCOBACK:
C00044 00009	.sec Methods of Procedure, METHODSCORE:
C00047 00010	.ss Representation, REPCORE:
C00055 00011	.ss Reasoning, REASONCORE:
C00060 00012	.ss Knowledge Acquisition and Management, KAMCORE:
C00066 00013	.ss Multiple Uses of a Knowledge Base
C00071 00014	.sec Significance
C00074 00015
C00084 ENDMK
C⊗;
.indexchar←null;
.halflinenote←true;
.plainnumbernote←true;
.footsep←"------------------";
.library <pub>newjournal

.underlinebetween (<<,>)
.single space
.every heading (Core Research, {DATE}, 1st draft)
.blankline
.device tty

.next page
.preface 1
.indent 0
.PAGE FRAME 60 HIGH 75 WIDE
.AREA TEXT LINE 4 to 57
.TITLE AREA FOOTING LINE 59
.PLACE TEXT



.begin center
.skip 2
Core Research Proposal
.skip 20

.end

.sec Objectives of Research


The long term goal of artificial intelligence research
at the Heuristic Programming Project (HPP) is to
understand and build intelligent systems.  
Over the past decade we have studied
intelligent systems in the context of 
scientific and medical applications where human expertise for solving the
problems was evident and where the difficulty
of the problem seemed to lie just outside the boundaries
of current AI methods.  Because of the complexity of the applications, 
a significant part of the effort has been to make the expert knowledge
of the problem explicit and to represent it appropriately in a knowledge
base.  This perspective has focused attention on five areas for research:

.crown
	(1) Representation -- designing the symbolic
	    structures for modeling the knowledge about a problem.
            Presently this phase is carried out by the system builders;
            we intend to codify the knowledge used to make such decisions,
            both as an aid to the system builders and ultimately to enable
            the programs themselves to choose appropriate representations.

	(2) Reasoning -- the program's manipulations of the symbolic
	    representations.  This is designed by studying and then
            modeling the appropriate inference mechanisms for a problem.
                                       
	(3) Knowledge acquisition -- increasing the system's ability to 
	    acquire knowledge by direct communication with human experts.

	(4) Discovery and theory formation -- developing the
            system's abilities to find regularities in
	    knowledge and to create concepts having explanatory power.

	(5) Multiple uses of knowledge -- Using the domain
	    knowledge for additional purposes such as
            consensus building (accommodating conflicting advice from
            experts whose competence may be equal but whose "styles"
            vary), tutoring of human students by employing the
            knowledge base (both the information it contains and
            the way it is organized), and explanation (constructing a
            chain of rules from the knowledge base which
            satisfactorily rationalizes the system's behavior to a
            human expert observing it).


.endcrown
.sec Background and Rationale

	Artificial intelligence research at the Heuristic Programming
Project has utilized medical and scientific problems to focus the
research effort. For many different applications 
over the last decade this has led to a cycle of research as follows:

.crown
1.	Define an artificial intelligence problem in a 
	scientific application.  This involves forming
	a collaboration with a scientist in a challenging
	and interesting area. Ideally, the problem should 
        present new difficulties
	which admit no obvious solution by readily available artificial
	intelligence techniques.



2.	Propose a method for representing and manipulating the domain
	knowledge.  This involves acquiring the formal and informal knowledge
	that bears on the problem.  
	This includes utilizing or developing
 	a knowledge-based system that can solve the application problem.

3.	Test the system.  In this phase the method is pushed to its
	limits.  The relationship between the design 
	and the performance of the system is used as the basis for
	future development.

.endcrown
Both success and failure of a system can lead to further research steps.
When a system fails to solve a problem, the seeds for further research 
can sometimes be found in the reasons for failure.  On the other hand, 
when a knowledge-based system is successful, the desire to use it 
effectively uncovers a number of additional needs.  Thus, many of the 
topics of artificial intelligence -- such as the ability of a program to 
acquire knowledge, or to explain its
reasoning, or to manage updates in a knowledge base -- have grown out of
programs that were at first successful only at problem solving.
From this experience has come not only a set of approaches to building
intelligent systems, but also a broader understanding of what intelligent
systems should be like.

	The following sections discuss the background information about
each of our major research areas.  We will outline the progress that
has been made in each area and identify the major technological tools.
Then in {YONSEC METHODSCORE} we will
discuss our perception of the outstanding research issues 
and how we plan to approach them.


.ss Representation, REPBACK:

	One of the most basic ideas of artificial intelligence is the use 
of symbols and symbolic languages to represent knowledge.  
The computer solution of problems is cast as a symbol
manipulation problem where the manipulations correspond to inferences
that convert symbols that stand for the givens of a 
problem into symbols that stand for its solution.

	Sometimes a representation must have special properties
to meet the needs of an application.
Sometimes a particular application domain has some structure
which can be exploited to make the program's reasoning more efficient.
One simple example is the way in which a diagram speeds up theorem proving
in geometry.  An example from HPP experience is
the representation of chemical structures as in the
DENDRAL {ref Buchanan78} program:  The following 
figure shows two ways to draw a simple molecule.
.group
.stoptext

					       
	   C - C = C - C - C - N	      C	- C - C
		       |		      |	  |   ||
		       C		      N   C   C

.starttext
.apart
Even though these figures appear different, they actually represent
the same molecule.  By rotating the atoms in the figures, we can
convert one figure into the other.  Just as it
is important for a chemist to be able to recognize two different drawings
of the same molecule, it is important for DENDRAL to be able to recognize
different representations of a molecule.  One of the key ideas behind
the DENDRAL representations was the idea of canonical, or standard, form.
If two representations of molecules were presented in canonical form,
the molecules were identical if and only if the representations were equal.
This requirement led to a highly specialized representation 
for chemical structures in the DENDRAL program.
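The canonical-form idea can be illustrated with a small sketch in modern notation (a hypothetical toy, not DENDRAL's actual representation or algorithm): number the atoms every possible way and keep the lexicographically smallest description, so that equality of forms coincides with identity of molecules.

```python
from itertools import permutations

def canonical_form(atoms, bonds):
    """Return a standard form for a small molecular graph.

    atoms: list of element symbols; bonds: set of (i, j, order) triples.
    Brute force: try every renumbering of the atoms and keep the
    lexicographically smallest description.  Two drawings then denote
    the same molecule if and only if their canonical forms are equal.
    """
    n = len(atoms)
    best = None
    for p in permutations(range(n)):
        new_number = {old: new for new, old in enumerate(p)}
        relabeled_atoms = tuple(atoms[old] for old in p)
        relabeled_bonds = tuple(sorted(
            (min(new_number[i], new_number[j]),
             max(new_number[i], new_number[j]), order)
            for i, j, order in bonds))
        candidate = (relabeled_atoms, relabeled_bonds)
        if best is None or candidate < best:
            best = candidate
    return best

# Propene written two ways -- different drawings, same molecule:
drawing1 = canonical_form(["C", "C", "C"], {(0, 1, 2), (1, 2, 1)})
drawing2 = canonical_form(["C", "C", "C"], {(0, 1, 1), (1, 2, 2)})
# drawing1 == drawing2
```

The brute-force search over renumberings is exponential; DENDRAL used chemically informed canonicalization, but the "iff" property is the same.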


	One of the trends in our work has been to develop 
general purpose approaches for representing a broad range 
of knowledge in a knowledge base.  This is illustrated by the Unit Package 
that has been developed for the MOLGEN ({ref Martin77}, 
{ref Stefik78a}) project for experiment planning in molecular genetics.
In the figure below are two units from a MOLGEN knowledge base.
The first unit represents the restriction-enzyme EcoR1;  the second
unit represents a problem-solving goal for an experiment.
.group
.stoptext
--------------------------------------------------------
NAME:			ECOR1
SITE-TYPE:		STICKY-HEXA
3'-END:			OH
5'-END:			P
MODE:			NON-PROCESSIVE
MOLWT:			28500
SUBSTRATE:		DNA
RECOGNITION-SITE:
			1  2  3  4  5  6  7  8
			G  ↑  A  A  T  T     C
			----------------------
			----------------------
			C     T  T  A  A  ↑  G
			16 15 14 13 12 11 10 9
--------------------------------------------------------

NAME: 			LAB-GOAL-1
STATE:			A CULTURE with
			  ORGANISMS = A BACTERIUM with
					EXOSOMES = A VECTOR with
						     GENES = RAT-INSULIN
CONDS:			(PURE? ORGANISMS CULTURE)
--------------------------------------------------------

.starttext
.apart
The usual way of using the Unit Package is to define general knowledge
before specific knowledge.  For example, general knowledge about
enzymes, nucleases, and restriction enzymes would be entered
before the specific knowledge about a particular restriction enzyme 
like EcoR1.  The Unit Package is designed to encourage the use 
of descriptions, such as the description of a culture in the second 
unit above.  These descriptions are used for checking new information
as it is entered and for pattern-matching operations that are part of
a reasoning step. A technical report {ref Stefik78}, which describes 
the Unit Package and compares it to other work on representation,
is included in this proposal.  
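The flavor of defining general knowledge before specific knowledge can be sketched as follows (a hypothetical illustration; the slot names are invented and this is not the Unit Package's actual interface): a specific unit inherits any slot it does not fill itself from its more general parent.

```python
class Unit:
    """Toy frame: named slots plus inheritance from a more general unit."""
    def __init__(self, name, parent=None, **slots):
        self.name, self.parent, self.slots = name, parent, slots

    def get(self, slot):
        # look up the slot locally first, then inherit from the general unit
        if slot in self.slots:
            return self.slots[slot]
        return self.parent.get(slot) if self.parent else None

# general knowledge is entered before specific knowledge:
enzyme = Unit("ENZYME", substrate="MACROMOLECULE")
restriction = Unit("RESTRICTION-ENZYME", parent=enzyme, substrate="DNA")
ecor1 = Unit("ECOR1", parent=restriction,
             site_type="STICKY-HEXA", molwt=28500)

ecor1.get("substrate")   # inherited from RESTRICTION-ENZYME: "DNA"
```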


	The examples above have illustrated the 
representation of object or "noun-like" knowledge.  Every program which
performs a reasoning task must also represent the inferential knowledge.
In the first version of the DENDRAL program, this kind of knowledge 
was represented as a program.  This choice of representation had the 
consequence that a chemist could not enter new knowledge into the program
directly.  Since the program structures were not understandable by the
program itself, facilities for explanation of DENDRAL's reasoning had
to be built into the program.
In the MYCIN program {ref Shortliffe76}
developed more recently, the inferential knowledge was moved out of the
program and into a knowledge base.  This knowledge was represented
in production rules {ref Davis75}.  An example of a production
rule follows:
.group
.stoptext

  If	1) The gram stain of the organism is gram negative, and
	2) the morphology of the organism is rod, and
	3) the aerobicity of the organism is anaerobic,

 then	there is suggestive evidence (.6) that the identity of
	the organism is Bacteroides.

.starttext
.apart
This allowed the system to generate its own explanations by examining
the rules it had used.
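The scheme can be sketched in a few lines (the rule and predicate names below are invented, not MYCIN's own): because the rules are data, the same structure that drives the inference also supports the explanation.

```python
# Each rule is data, not program: (name, premises, conclusion).
RULES = [
    ("RULE1",
     ("gramneg", "rod", "anaerobic"),
     "bacteroides"),
]

def run(facts, rules):
    """Fire every rule whose premises hold; record which rules fired
    so the system can later explain its own conclusions."""
    fired = []
    for name, premises, conclusion in rules:
        if all(p in facts for p in premises):
            facts.add(conclusion)
            fired.append(name)
    return fired

def explain(fired, rules):
    """Rationalize behavior by examining the rules that were used."""
    index = {name: (prem, concl) for name, prem, concl in rules}
    return [f"{index[n][1]} because {n}: {' and '.join(index[n][0])}"
            for n in fired]

facts = {"gramneg", "rod", "anaerobic"}
fired = run(facts, RULES)
```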
Production rules illustrate many of the themes which run through
our work on representation.  
.crown
(1) Explicitness -- Knowledge is encoded in a knowledge base and
	not just in programs.  (For example, production rules are used 
	to make inferential knowledge explicit.)  The distinction
        between knowledge
        being in a program, and in a knowledge base, is a crucial one.  
        Information encoded as a program can be run, and initially
        coded, more easily and quickly.  However, as the program grows,
        it becomes more and more difficult to add new knowledge: its
        relationships to all the other knowledge must be considered and
        programmed in explicitly.  The latter method, storing knowledge
        in a separate data structure, a knowledge base, enables the
        pieces of knowledge to be accessed and manipulated just like data.
        While their use, their running, may be somewhat slower, the system
        builder can now enter data modularly, without much concern for the
        rest of the knowledge present.  He can give the system knowledge
        for choosing and manipulating and reasoning about knowledge.
        The program can then, e.g., more easily explain its behavior
        and discuss it, than if all its knowledge were in one program.
        Finally, our recent research has touched on methods for reclaiming
        most if not all of the efficiency this methodology costs us,
        e.g., by caching the results of frequently-repeated inference chains.

(2) Modularity -- Knowledge is encoded in independent chunks
	as far as possible.  (Production rules can be added or
	deleted from a knowledge base to change its problem-solving
	behavior.)   The grain size of the rules
        should correspond to appropriate chunks of advice from a domain expert.  This
        is useful both if the expert is to input rules directly, and if he is
        to be convinced by the system's explanation of its behavior.

(3) Uniformity -- Knowledge is represented so that it can be manipulated
	by general purpose programs.  (Production rules and frames are
	two of the uniform methods for which we have general purpose
	processing routines.)
.endcrown

	Our perception of the outstanding research issues in representation
is discussed in {YONSS REPCORE}.  As can be seen from the examples above,
how knowledge is to be used is important in determining how it should
be represented.  With more uses for knowledge -- explanation, tutoring,
problem-solving -- come more constraints on its representation.  

.ss Reasoning, REASONBACK:

	The first step in creating a problem-solving system is
to develop and test a method for reasoning.  
In the DENDRAL {ref Buchanan78} program for inferring chemical structures
from mass spectrometry data, the reasoning framework that we tested was
called the Generate-and-test paradigm.  This consisted of (1) an exhaustive
generator of all possible solutions (chemical structures) and (2) 
a set of pruning rules which used the mass spectrometry data to eliminate
inconsistent answers.  One of the issues that became relevant in studying
this reasoning framework is the combination of possibly contradictory 
evidence.  Data in many problems is incomplete and errorful;
there is seldom a perfect match between an internal model and data.
Even if DENDRAL had a perfect model of how mass spectrometry data
corresponds to chemical structures, the data from any particular run
of a mass spectrometery is noisy so that there is both extraneous
and missing data.  In DENDRAL, an overall domain-specific matching function
was used which reflected <<a priori> probabilities of errors in the data.
Recently we have reexamined this problem {ref Stefik78a} in the context
of the GA1 program which solves an analogous problem from 
molecular genetics.
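The paradigm can be sketched abstractly (an invented toy problem; DENDRAL's generator enumerated chemical graphs, not strings of atoms): an exhaustive generator proposes every candidate, and data-driven pruning rules eliminate the inconsistent ones.

```python
from itertools import permutations

def generate(atoms):
    """Exhaustively propose every candidate arrangement of the atoms."""
    return sorted(set(permutations(atoms)))

def survives(candidate, pruning_rules):
    """A candidate survives only if no pruning rule rejects it."""
    return all(rule(candidate) for rule in pruning_rules)

# hypothetical pruning rules standing in for mass-spectral constraints:
pruning_rules = [
    lambda c: c[0] == "C",        # fragment data fixes one end of the chain
    lambda c: c.index("N") > 1,   # nitrogen cannot sit near that end
]

candidates = [c for c in generate(["C", "C", "N"])
              if survives(c, pruning_rules)]
# only ("C", "C", "N") survives both rules
```

Because the generator is exhaustive, every structure consistent with the data is guaranteed to appear in the output; the power comes from how sharply the pruning rules cut.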

	For the MYCIN medical consultation {ref Shortliffe76} program
we used backwards-chaining as a reasoning framework.
This method develops a line of reasoning by chaining together 
MYCIN's inference rules (production rules) backwards from possible diagnoses
towards the available evidence.  This particular reasoning framework
has proved especially convenient for developing computer explanations of the
program's reasoning.
To deal with imperfect evidence and imperfect rules of inference,
a mathematical model of certainty based on numeric "certainty factors" was
developed.  This model captures "plausible reasoning" as the
program combines evidence.  An extension to this model has been developed
for use in the PROSPECTOR {ref Duda77} system at SRI.
In order to test the MYCIN approach in other domains, a
domain independent package, EMYCIN (for "Essential MYCIN") has been created
and is being utilized in other applications discussed elsewhere in this
proposal.
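A minimal sketch of backward chaining with certainty factors follows (illustrative only: the rule strength 0.6 echoes the Bacteroides rule above, but the predicate names are invented and MYCIN's full model also handles negative evidence and rule contexts).

```python
RULES = {
    # goal: list of (rule strength, premises)
    "bacteroides": [(0.6, ["gramneg", "rod", "anaerobic"])],
}

def certainty(goal, ask, rules):
    """Chain backwards from a goal toward askable evidence."""
    if goal not in rules:
        return ask(goal)              # primitive datum: stop and ask for it
    total = 0.0
    for strength, premises in rules[goal]:
        # a conjunction is only as certain as its weakest premise
        premise_cf = min(certainty(p, ask, rules) for p in premises)
        if premise_cf > 0.2:          # threshold below which evidence is ignored
            evidence = strength * premise_cf
            total = total + evidence * (1 - total)   # pool positive evidence
    return total

# with fully certain evidence the rule contributes its full strength:
cf = certainty("bacteroides", ask=lambda g: 1.0, rules=RULES)   # 0.6
```

Note how the `ask` function models MYCIN's behavior of stopping to request data when the chain bottoms out.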

	When MYCIN is chaining back through its inference rules and
discovers a need for information, it stops and asks for
it.  This approach is appropriate only when there is a way of supplying data
when it is needed by the reasoning program.  For some applications, such
as signal interpretation, it is better for the program to make use of
whatever it knows.  Further limitations of the backwards-chaining model are
(1) it is unidirectional, hence cannot mix top-down and bottom-up processing
and (2) it is exhaustive, hence
less efficient than approaches that reason hierarchically
by working with abstractions.  

	An alternative reasoning model which 
does not have these limitations is the "cooperating knowledge sources"
model developed for the HEARSAY-II {ref Erman75} system.
This model consists of (1) the "blackboard", a global data structure
which holds the system's hypotheses, and (2) a set of 
"knowledge sources" (KSs) which contain the inference rules for the system.
Because of gaps in the theory and implementation of the individual KSs 
and noise in the data, the KSs are individually incomplete and errorful.  
A version of the "hypothesize and test" paradigm is used
which emphasizes cooperation (to help with incompleteness) and 
cross-checking (to help with errorfulness). During the hypothesize
part of the cycle, a KS can add a hypothesis to the blackboard; during
the test part of the cycle, a KS can change the rating of a
hypothesis in the blackboard.  This process terminates when a consistent 
hypothesis is generated satisfying the requirements of the overall 
solution or when knowledge is exhausted.
The power of the blackboard -- over, say, a uniform predicate calculus
or  QA4-ish assertional
net -- is its structure: it is n-dimensional, where the dimensions have
some meaning (time, level of abstractness, geographic location, etc.),
hence each rule can know what part(s) of the blackboard to monitor, and
each hypothesis is carefully placed at a meaningful spot on the blackboard.
This is a simple but powerful type of <<analogic> modelling of the domain.
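A toy version of the hypothesize-and-test cycle on a structured blackboard can be sketched as follows (a hypothetical illustration, not HEARSAY-II; the "peak" task, thresholds, and ratings are invented). The board is indexed by (level, time), so each knowledge source need only monitor the region it understands.

```python
signal = [0, 7, 1, 8, 0]        # raw data at the lowest level
board = {}                      # (level, time) -> rating of a hypothesis

def ks_hypothesize():
    """Hypothesize: post a 'peak' hypothesis above every strong sample."""
    changed = False
    for t, v in enumerate(signal):
        if v > 5 and ("peak", t) not in board:
            board[("peak", t)] = 0.5
            changed = True
    return changed

def ks_test():
    """Test: cross-check each peak against its neighborhood and re-rate it."""
    changed = False
    for (level, t), rating in list(board.items()):
        quiet = all(signal[u] < 3 for u in (t - 1, t + 1)
                    if 0 <= u < len(signal))
        new = min(1.0, rating + 0.4) if quiet else max(0.0, rating - 0.4)
        if new != rating:
            board[(level, t)] = new
            changed = True
    return changed

# run the KSs until none can contribute -- "knowledge is exhausted"
while any(ks() for ks in (ks_hypothesize, ks_test)):
    pass
```

Each KS is independent and communicates only through the board, which is what lets the model mix top-down and bottom-up processing.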

	Two research programs based on this paradigm have been developed by
our group {ref Nii77}.  One is the CRYSALIS program for interpreting
x-ray crystallography data and the other is a military signal 
interpretation program.  In these programs the HEARSAY model was extended
by (1) extending the blackboard to allow for several independent
hierarchical relationships among data and hypotheses and (2)
extending the control structure to have three levels.
The first level is the hypothesis-formation level.  KSs on this level
make changes to the blackboard panels.  In the hypothesize
and test paradigm, they put hypotheses on the blackboard and test the
hypotheses of other KSs.  A rating is associated with each hypothesis
to store the overall judgment.  
Immediately above the hypothesis-formation level is the
KS-activation level, which contains two KSs,
called the "event-driver" and the "expectation-driver".
A domain-independent package, AGE-1, has been developed so
that this reasoning framework can be tested further in other applications.

	In each of the examples above, our study of reasoning methods
always starts in the context of a problem in a scientific or 
medical domain.  We then generalize the method
and package it for further testing in other domains.  When a framework for
reasoning works well enough, research on other artificial intelligence 
topics, such as explanation or knowledge acquisition, often follows.
Our perception of the outstanding research issues in
reasoning methods is discussed in {YONSS REASONCORE}.


.ss Knowledge Acquisition and Management, KAMBACK:

	One of the characteristics of the domain problems that we
study is that they require a substantial amount of domain expertise.
Goldstein addressed this point in {ref Goldstein77}.

.quotation
	Today there has been a shift in paradigm.  The fundamental
problem of understanding intelligence is not the identification of a
few powerful techniques, but rather the question of how to represent
<<large amounts of knowledge in a fashion that permits their effective
use and interaction>.  This shift is based on a decade of experience with
programs that relied on uniform search or logistic techniques that
proved to be hopelessly inefficient when faced with complex problems in
large knowledge spaces.
.endquotation

Domain expertise includes much of the formal and informal problem-solving
knowledge of the domain expert;  it also includes the set of perhaps
mundane facts and figures that make up the elementary knowledge of the
domain.   Before a computer system can solve problems in the domain, 
this information must be transferred from the expert to the computer.  


	Over the last decade, there has been some encouraging progress
along this dimension.  In 1968  our group started work on the DENDRAL 
program {ref Buchanan78} for inferring chemical structures from mass 
spectrometry data.  The rules of inference about mass spectrometry had
to be put in machine form, but knowledge acquisition by the
program from the chemist was beyond our technology.  Knowledge was
added by a painstaking process in which a computer scientist together
with a chemist learned each other's terminology and then wrote down the
chemical rules for the simplest kinds of chemical compounds.  Then the 
computer scientist entered the rules into the
computer and tested them and reported the results back to the chemist.
The reward for this effort over several years was a program with
expert-level performance.

	It is interesting to compare the knowledge acquisition effort
of the DENDRAL program with that of a more recent program -- PUFF.
PUFF is the product of a collaboration with the Pacific Medical Center
in San Francisco. It is a system for diagnosing pulmonary function disorder.
One hundred cases, carefully chosen to span the variety
of disease states, were used to extract 55 rules.  The knowledge base was
created and then tested with 150 additional cases.  Agreement
between PUFF and the human expert was excellent and PUFF is now in routine
use at PMC.  In contrast with DENDRAL, PUFF was
created in less than 50 hours of interaction with experts at PMC and with
less than 10 man-weeks of effort by the knowledge engineers.  
	
	
	Part of this tremendous difference in development time
is due to the fact that the domain of pulmonary function is much simpler
than mass spectrometry.  However, the main reason that the development
was so rapid is that PUFF was built with the aid of an interactive
knowledge engineering tool, EMYCIN,
the domain independent core of the MYCIN 
program.  
EMYCIN provides a framework for building consultation systems in various
domains and has dialogue facilities for acquiring a production rule 
knowledge base.  When knowledge engineers at the Heuristic Programming
Project started the PUFF project, 
they already had a reasoning framework in which
to fit the problem and an "English-like" language for expressing the
diagnostic rules.  The facilities that make EMYCIN such a powerful tool
are the direct result of the core research over the last five years on
the MYCIN program.

	Another dimension of progress closely related to knowledge
acquisition is knowledge management, that is, management of the global
structure of a knowledge base.
A knowledge base is more than a set of isolated facts: its elements
are related to one another.  In the DENDRAL program,
all of the knowledge was represented as programs and LISP data
structures. If changing one part of the program meant that
another part had to be changed as well, the programmer had to know that.
As programs or knowledge bases get large, this kind of effort becomes
substantial.  A system becomes too large to maintain when no one can
remember all of the interactions and every change introduces bugs.
One of the ideas in TEIRESIAS {ref Davis76} is that a system can take
on some of the responsibility for making changes.  This had long been
advocated by many researchers [Floyd {IFIPS ref}, Green et al {1974 AI 
memo by 9 authors},  Lenat {IJCAI5 paper on Beings},
Balzer {ref?}]  in the context of automatic programming:  producing
systems which were capable of aiding in the synthesis of new systems,
and capable of managing modifications to themselves.  TEIRESIAS stored
updating instructions in the knowledge base.  When changes were made to
the knowledge base, these instructions enabled the system
to automatically make the necessary additional changes.
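The idea can be sketched as follows (hypothetical data structures, not TEIRESIAS itself): the knowledge base carries, alongside each kind of entry, instructions for the dependent changes, so the system rather than the programmer keeps the knowledge base consistent.

```python
kb = {"organisms": ["e.coli"],
      "organism-index": {"e.coli": 0}}

# updating instructions are stored with the knowledge base itself:
update_instructions = {
    "organisms": lambda kb, item: kb["organism-index"].update(
        {item: len(kb["organisms"]) - 1}),
}

def add(kb, kind, item):
    """Add an entry; the system performs the dependent changes itself."""
    kb[kind].append(item)
    if kind in update_instructions:
        update_instructions[kind](kb, item)

add(kb, "organisms", "bacteroides")
# the index is brought up to date automatically
```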


Research issues in knowledge acquisition and management are discussed
in {YONSS KAMCORE}.
.sss Discovery and Theory Formation, DISCOBACK:

; ************  Bruce's section

  progress:  None of these topics existed before!


Discuss AM and Meta-Dendral
[[DBL: How well does Eds IJCAI paper (sections on AM and
MetaDendral) fit the requirements here?  Bruce, you said you have
a short thing on MetaD; perhaps we should just use that with
a couple paragraphs on AM added by me.  Let me know Sunday.]]

Discuss MOLGEN application?

.sec Methods of Procedure, METHODSCORE:

	This section discusses our research plans and our perception
of the outstanding research issues.  Our approach to research 
has continued to focus on real problems in scientific and
medical applications.  These problems have historically provided the
challenges to develop new artificial intelligence methods.
Once a method has been developed and works well on a particular 
problem, we have proceeded to test its limits.
We are interested in exploring the effects of new ideas about
knowledge-based programming on a variety of systems to effectively
test the generality of these ideas.  Each of the topics in the core
research area will be developed in the context of more than one
example program.

	The expert systems developed at the Heuristic
Programming Project over the last decade can be used as tools for the
development of the core research topics.  Each of the biomedical
domains has particular aspects that can be utilized in this work: the
MOLGEN program for molecular genetics research has methods for
representing experiment planning, the MYCIN program for infectious
disease diagnosis and therapy has a well developed rule set, the PUFF
program for pulmonary function test interpretation has a small
rule set, and the VM program for interpreting physiological
measurements from the Intensive Care Unit has a knowledge base that
emphasizes knowledge that changes over time.


.ss Representation, REPCORE:

	In {YONSS REPBACK} we traced our work from 
specialized representations as in the DENDRAL program to representations
of more general applicability -- such as our production rule and
frame methodology.  Today's representation systems, even the "general" ones,
do not solve all of the problems that we are encountering in our research.
In most science, methods which are general are also weak.  There seems 
always to be a need to tailor aspects of a representation to particular
problems.  The following representation issues stand out in our work:

<<Time-based knowledge>

	Several problems which we are working on involve situations 
that evolve over time. In the Ventilator Management (VM) 
program {ref Fagan78}, time enters as instrument data that varies from
moment to moment.  The program must correctly track the stages of treatment
on the ventilator.
In the RX {ref Blum78} program for reasoning from time-based clinical
data bases, statements about disease and treatment of patients need
to be adequately quantified over time.  In the MYCIN {ref Shortliffe76}
work, we want the system to be able to resume a consultation session about
a patient and appropriately update new knowledge about the patient as
treatment progresses.  In the MOLGEN project {ref Martin77}, the 
experiment planning program must plan a sequence of steps.  It must predict
how the laboratory objects will be changed over time as the manipulations
proceed.  The basic issues common to these projects are (1) time-specified
reference to objects and (2) tracking causal changes on objects over time.
While these problems do not seem conceptually difficult, they do require
extensions to the representational tools which we have available.
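A minimal sketch of time-specified reference follows (the slot and stage names are invented for illustration): each attribute of an object is a history of (time, value) pairs rather than a single value, so a reasoning step can ask what was true of the object at a particular time.

```python
class TemporalObject:
    """An object whose attributes are histories, not single values."""
    def __init__(self):
        self.history = {}                 # slot -> [(time, value), ...]

    def record(self, slot, time, value):
        self.history.setdefault(slot, []).append((time, value))
        self.history[slot].sort()

    def at(self, slot, time):
        """Value of a slot as of a given time (most recent prior entry)."""
        value = None
        for t, v in self.history.get(slot, []):
            if t <= time:
                value = v
        return value

patient = TemporalObject()
patient.record("ventilation", 0, "ASSIST-CONTROL")
patient.record("ventilation", 5, "T-PIECE")

patient.at("ventilation", 3)    # the stage in force at time 3
```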

<<Grainsize in Complex Systems>

	Among the virtues of production rules
$$ By "production rules" we do not limit ourselves to the pure,
structurally flat syntax adopted with great success for cognitive
modelling {Newell & Simon, HPS or PSG ref}; rather, we embrace
any system of conditional statements, where the premises are some
testable condition, and the conclusion is an appropriate action to
take, or inference to draw. $
 are (1) their modularity
allows easy addition and modification of inferential knowledge and (2) 
their grainsize seems appropriate for explanation systems.  
As we move toward hierarchical reasoning methods, the grainsize of
individual production rules seems too small for coherent explanations.
Just as the reasoning methods work with abstractions to reduce the
combinatorics, the explanations should also be abstract.

	At present, the problem of factoring knowledge is an opaque art.
When a frame-structured representation is used, a knowledge engineer
makes decisions about what facts to group together.  This decision takes
into account indexing during problem solving and the interactions between
knowledge.  In hierarchical reasoning methods 
knowledge is viewed with a varying grain size;  it starts with a
large grainsize at the beginning of problem solving and moves toward smaller
grainsize as the solution proceeds.  This insight has only recently begun to
filter
down into our representation systems (hierarchies of units in AM and MOLGEN,
for example).

<<Matching representation methods to problems>

	In our current systems, a knowledge engineer must learn the
particulars about a problem and then pick or develop an appropriate
representation.  We would like to move towards a system which takes
more responsibility for choice of representation.  
One of our long term research goals is a system which can select or
modify its representations combining the knowledge of the limits and
advantages of representations with the knowledge of its own needs.

.ss Reasoning, REASONCORE:


	In {YONSS REASONBACK} we traced our research on methods of
reasoning from the Generate-and-Test paradigm (DENDRAL, GA1), to
backwards chaining (MYCIN, EMYCIN, PUFF), to the cooperative knowledge
sources model (CRYSALIS, HASP, AGE-1).  In the following we will discuss
core issues related to these reasoning models as well as some ideas for
new reasoning models.


<<Incomplete Reasoning>

	One of the themes in all of our methods of reasoning is the
treatment of inexact and incomplete knowledge.  This was characterized
in the MYCIN system by the mathematical model of certainty. 
One of the 
difficulties which we have perceived in this model is that the 
representation is inadequate for discriminating between (1) absence of
evidence and (2) evidence of absence.  This example illustrates how
the needs of the reasoning program have to influence the 
fundamental representations used in the system.
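The difficulty can be made concrete with the certainty-factor combining function (shown here in the revised form from the later CF literature; a sketch, not MYCIN's code): strongly conflicting evidence pools to the same value as evidence never gathered at all.

```python
def combine(cf1, cf2):
    """Pool two certainty factors on the scale -1 (false) .. +1 (true)."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 <= 0 and cf2 <= 0:
        return cf1 + cf2 * (1 + cf1)
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

# Strong evidence for and strong evidence against pool to 0.0 ...
balanced = combine(0.7, -0.7)
# ... which is the same value assigned to a fact never examined at all:
unexamined = 0.0
```

A single number thus cannot discriminate "we looked and found nothing conclusive" from "we never looked", which is the inadequacy noted above.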


<<Reasoning with abstractions>

	The availability of the Unit Package {ref Stefik78} has broadened
our capabilities for representing abstractions.  For example, an
organism can be variously described as "a bacterium",
"E.coli K-12", "a bacterium that is grampositive", or even
"a bacterium with a vector which has the rat-insulin gene".  A reasoning
program can use the descriptions available in the Unit Package as 
abstractions in its reasoning process.  We are currently using this idea 
in the MOLGEN project for reasoning about experiment planning.
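The matching step can be sketched as follows (a hypothetical taxonomy, not the Unit Package's matcher): a description matches an object if the object's type equals the description's type or is a specialization of it, so a reasoning program can work with "a bacterium" before committing to "E.coli K-12".

```python
# toy specialization hierarchy: specific -> more general
isa = {"e.coli-k12": "bacterium", "bacterium": "organism"}

def abstraction_of(general, specific):
    """Is `general` an abstraction of `specific` (or the same type)?"""
    while specific is not None:
        if specific == general:
            return True
        specific = isa.get(specific)      # climb to the next abstraction
    return False

abstraction_of("bacterium", "e.coli-k12")   # a bacterium abstracts E.coli K-12
```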

<<Orthogonal Planning>

	One of the themes in our representation work is to make
knowledge explicit for general processing.  We have carried this theme
into an experimental framework for reasoning being developed currently
in the MOLGEN project.  The idea is to make the reasoning 
operations, which are carried out by a planner, explicit in the knowledge 
base.  These operators are then
treated as (implicitly defining) an abstract "planning space".  
Our hope is that this will
provide a computer with a planning method more powerful and
flexible than previous hierarchical planning methods.  The feasibility of
this approach is currently being tested.

<<Matching reasoning methods to problems>

	One of our long term goals in developing and understanding reasoning
methods is to develop a theory for matching reasoning methods to problems.
Such a theory would combine knowledge of the limitations of available
reasoning frameworks with the needs of an application to aid in the
design of a knowledge-based system.

.ss Knowledge Acquisition and Management, KAMCORE:

	In {YONSS KAMBACK}, we traced our work on knowledge acquisition
from the DENDRAL program, where knowledge was acquired by a knowledge
engineer and then programmed into the system, to the PUFF example where
the EMYCIN package greatly accelerated the creation of a consultation
system for pulmonary function diagnosis.

<<Three Phases of Knowledge Acquisition>

	As a result of our recent experiences with the SACON program 
{ref Bennett79}, we
have found it useful to characterize the knowledge acquisition process as
occurring in three distinct phases.  We have done the most research on
the third phase and plan to work our way towards the first phase.

.crown

(1)#####<<Framework Identification> The first phase corresponds to making
 initial decisions about the typical advice the
consultant will give and the major reasoning steps the consultant will use.


(2)#####<<Acquisition of fundamental concepts> This is followed by an 
extended period of defining parameters and 
objects.  These objects form the fundamental vocabulary of the domain.
Using this initial domain vocabulary, 
a substantial portion of the rule base is developed.  This process, lasting
approximately two months in the structural analysis case, captures enough
domain expertise to allow the consultation system to give advice on a
large number of common cases. 

(3)#####<<Acquisition in a well-developed knowledge base> In the final 
phase, further interactions with
the expert tend to refine and adjust the established rule base, primarily to
handle  more obscure or complicated cases.  In this phase, the system can
draw on examples from the knowledge base to guide the acquisition process.

.endcrown

Our earlier work on the TEIRESIAS program {ref Davis76}, 
which explored one possible method for handling this "final phase", will 
provide the basis for our research in knowledge acquisition.
This phase of the acquisition task uses the 
large body of existing knowledge to set the appropriate context for
understanding new facts.  
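A toy version of this idea follows (the attribute names and the
co-occurrence heuristic are invented; TEIRESIAS's rule models are
considerably more elaborate).  Regularities in the established rule base are
used to flag a probably incomplete new rule:

```python
# Sketch: use regularities in an established rule base to suggest
# attributes missing from a newly entered rule.

from collections import Counter
from itertools import combinations

RULE_BASE = [
    {"premise": {"site", "gram", "morphology"}, "conclusion": "e.coli"},
    {"premise": {"site", "gram"}, "conclusion": "pseudomonas"},
    {"premise": {"gram", "morphology"}, "conclusion": "klebsiella"},
]

def cooccurrence(rules):
    """Count how often pairs of premise attributes appear together."""
    pairs = Counter()
    for r in rules:
        for a, b in combinations(sorted(r["premise"]), 2):
            pairs[(a, b)] += 1
    return pairs

def suggest_missing(new_premise, rules, threshold=2):
    """Suggest attributes that usually accompany the ones mentioned."""
    pairs = cooccurrence(rules)
    suggestions = set()
    for (a, b), n in pairs.items():
        if n >= threshold:
            if a in new_premise and b not in new_premise:
                suggestions.add(b)
            if b in new_premise and a not in new_premise:
                suggestions.add(a)
    return suggestions

# An expert enters a rule mentioning only "gram":
print(suggest_missing({"gram"}, RULE_BASE))  # suggests 'morphology' and 'site'
```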


<<Consistency>

Developing an understanding of the automatic management of knowledge 
during and after its acquisition is an important aspect of our research
aims.  The knowledge base consists of the totality of concepts and
relations between concepts that have been presented to the program.  We
will investigate methods for determining the consistency of the
aggregate knowledge base.

The quality of the knowledge base is improved through experimentation.
Cases are run (for medical domains) by selecting a diverse set of
patients and comparing the results to the conclusions of our expert.
When the results don't match, the knowledge base must be updated
to account for the discrepancies.  Two operations are important
for this process: (1) the ability to determine the piece or pieces of
knowledge that must be changed and (2) determining that changing
the knowledge to correct the results on one patient will not produce
incorrect results when applied to another patient.
Another possibility currently being investigated is, in effect,
to live with inconsistency,
just as people apparently do.  Predominantly rational behavior may
be evinced by a system that is not fully consistent.  The key
test is whether the elimination of any "inconsistent" rule makes the
system behave better or worse in the long run.  This is closely tied to
consensus formation, as we discuss in the next section.
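Operation (2) amounts to a regression test over the stored cases.  A
minimal sketch, with an invented case format and a deliberately trivial
rule interpreter:

```python
# Sketch: after editing the knowledge base, re-run stored cases and
# report which ones now disagree with the expert's conclusion.

def run_case(rules, case):
    """Trivial interpreter: return the conclusion of the first rule
    whose premise is satisfied by the case findings."""
    for premise, conclusion in rules:
        if premise <= case["findings"]:
            return conclusion
    return None

def regression(rules, cases):
    """Return the ids of cases where program and expert disagree."""
    return [c["id"] for c in cases
            if run_case(rules, c) != c["expert_conclusion"]]

CASES = [
    {"id": 1, "findings": {"fever", "cough"}, "expert_conclusion": "flu"},
    {"id": 2, "findings": {"fever"}, "expert_conclusion": "infection"},
]

old_rules = [({"fever", "cough"}, "flu"), ({"fever"}, "infection")]
new_rules = [({"fever"}, "infection")]          # a proposed edit

print(regression(old_rules, CASES))   # [] -- all cases agree
print(regression(new_rules, CASES))   # [1] -- the edit broke case 1
```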


.ss Multiple Uses of a Knowledge Base

We are exploring many additional uses of the knowledge base beyond the  
performance aspects for which we acquired the knowledge.  Three areas
are of interest: using the knowledge to explain the reasoning
steps of the program, using the knowledge for intelligent teaching about
the domain, and consensus building among experts.

<<Explanation>

	The use of explicit inference rules in a knowledge base has
made it possible to generate an explanation of the program's reasoning
steps.  While this has been achieved in the "backwards chaining"
reasoning model, it is more difficult in methods that
reason hierarchically.
We will examine methods for modifying the level of explanation
based on the abstractions used by the program and a model of the user.  
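For the backward-chaining case, the mechanism can be sketched in a few
lines (the rules and goal are invented): the chainer records each rule it
applies, and the trace is replayed afterwards as an explanation.

```python
# Sketch: a backward chainer that records the rules it applies,
# so the reasoning can be explained afterwards.

RULES = [
    ("r1", ["gram-negative", "rod-shaped"], "enterobacteriaceae"),
    ("r2", ["enterobacteriaceae", "lactose-positive"], "e.coli"),
]
FACTS = {"gram-negative", "rod-shaped", "lactose-positive"}

def prove(goal, trace):
    """Try to establish the goal; append (rule, goal) pairs to trace."""
    if goal in FACTS:
        return True
    for name, premises, conclusion in RULES:
        if conclusion == goal and all(prove(p, trace) for p in premises):
            trace.append((name, goal))
            return True
    return False

def explain(trace):
    return [f"{goal} was concluded using rule {name}"
            for name, goal in trace]

trace = []
prove("e.coli", trace)
for line in explain(trace):
    print(line)
# enterobacteriaceae was concluded using rule r1
# e.coli was concluded using rule r2
```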

<<Tutoring>

The act of explaining the knowledge
has led naturally to the question of using the knowledge base for tutoring 
purposes.  Our initial experiment with this in the MYCIN framework
{ref Clancey78} demonstrates the potential educational value
of this use of the knowledge base.  We will explore methods for 
conveying to the user the knowledge stored in the program.

<<Consensus Building>

	We will be devising approaches for building consensus among 
experts.  Because the strength of consultation
programs will in large part lie in their ability to pool knowledge from
several sources (e.g., multiple experts), it is important to recognize
apparent differences of opinion among experts and to assist, when possible,
in arriving at a consensus.  This represents another version of the
consistency checking problem: comparing the ramifications of 
multiple versions of knowledge and providing the capability to guide an
interaction in which such differences are "ironed out".  Of course there may
be times when <<both> versions of the knowledge need to be stored and
appropriately flagged so that users can select which expert's opinion they
will follow during a consultation.  The experts may wish to select a
<<style> of reasoning (e.g., empirical versus theoretical), 
rather than a particular individual's rules.
Ultimately, the system itself may be able to choose from differing
advice in its knowledge base.
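Recognizing apparent disagreement can be sketched as a comparison over
rules tagged by their source (the tagging scheme and rule format are
invented for illustration):

```python
# Sketch: flag rules from different experts that draw different
# conclusions from the same premises.

RULES = [
    {"expert": "A", "premise": frozenset({"fever", "rash"}),
     "conclusion": "measles"},
    {"expert": "B", "premise": frozenset({"fever", "rash"}),
     "conclusion": "roseola"},
    {"expert": "A", "premise": frozenset({"cough"}),
     "conclusion": "bronchitis"},
]

def conflicts(rules):
    """Group rules by premise; report premises with >1 distinct conclusion."""
    by_premise = {}
    for r in rules:
        by_premise.setdefault(r["premise"], []).append(r)
    return [group for group in by_premise.values()
            if len({r["conclusion"] for r in group}) > 1]

for group in conflicts(RULES):
    experts = [r["expert"] for r in group]
    print("disagreement between experts", experts)
# disagreement between experts ['A', 'B']
```

Both rules of a flagged group could then be stored, tagged by expert, so a
user (or eventually the system) can choose between them.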

All of these areas require some augmentation of the knowledge base to
provide the causal reasoning steps underlying the knowledge. 
This allows a program to explain why a particular rule was written in
addition to telling how the rule was used to reach a particular conclusion.
Similar needs have arisen in the use of a rule base for tutoring and 
for determining consensus among experts {ref Kunz78}.
Often, a rule will be put into the system in a much more
specific form than the one to which the knowledge truly applies.  One
task to investigate is how to generalize rules to just the proper level.
More complex still are the subtle changes that accompany a rule as it
is generalized (adjusting certainty factors, for instance).

.sec Significance


	The significance of this work is twofold:

.crown

1.######Understanding how a system can perform complex
	intelligent processes -- like diagnosis, explanation
	and discovery.  This work expands
	the boundaries of what we understand how to do with computers.

2.######Generalizing results of core research and
	understanding the limits of methods.
	This is the research that underlies the development of
	domain-independent tools of AI discussed elsewhere in this
	proposal.

.endcrown


One of our
ultimate goals is to understand the techniques employed in building
such programs.  It has always been difficult to determine whether a
particular problem-solving method used in a particular
knowledge-based program is domain-specific or whether it can
generalize easily to other domains.  In current knowledge-based
programs, the domain knowledge and the manipulation of it using AI
techniques are often so intertwined that it is difficult to uncouple
them to make a program useful for another domain.  The long-range
goal, then, is to isolate AI techniques that are general, to
determine the conditions for their use, and to build up a knowledge base
about AI techniques themselves.
We will carry out our research with
this question in mind: what are the criteria that determine whether a
particular problem-solving framework and representation system are
suitable for a particular application?

	


.refer (Blum78,
.|Blum, Robert L. and Wiederhold, Gio: Inferring Knowledge from#
.Clinical Data Banks Utilizing Techniques from Artificial#
.Intelligence. "Proc. 2nd Annual Symp. on Comp. Applic. in#
.Med. Care," pp. 303-307, IEEE, Washington D.C., Nov. 5-9, 1978|,,)

.refer (Bennett79,
.|Bennett J.S., Creary L.G., Engelmore R.E., Melosh R.B., A#
.Knowledge-based Consultant for structural analysis, forthcoming|,,)

.refer (Bobrow77a,
.|Bobrow D.G., Winograd T., An Overview of KRL, a Knowledge Representation#
.Language, Cognitive Science <<1>:1 (1977)|,,)

.refer (Bobrow77b,
.|Bobrow D.G., Winograd T., Experience with KRL-0, One cycle of a#
.knowledge representation language, 5IJCAI (August 1977)|,,)

.refer (Bonnet78,
.|Bonnet A., BAOBAB, A parser for a rule-based system using a#
.semantic grammar, Technical Report HPP-78-10, Heuristic Programming#
.Project, Stanford California (September 1978)|,,)

.refer (Brown77,
.|Brown, J.S., Steps toward a Theoretic Foundation for Complex,#
.Knowledge-Based CAI. BBN No. 3135.|,,)

.refer (Brown78,
.|Brown, J.S., Collins, A., and Harris, G.  Artificial#
.Intelligence and Learning Strategies.  To appear in (Harry#
.O'Neil, ed.), <<Learning Strategies>.#
.N.Y.: Academic Press, 1978.|,,)

.refer (Buchanan78,
.|Buchanan B.G., Feigenbaum E.A., DENDRAL and Meta-DENDRAL: Their#
.applications dimension, <<Artificial Intelligence> 11 (1978)|,,)

.refer (Clancey78,
.|Clancey, W.  "The Structure of a Case Method Dialogue",#
.to  appear in <<Int. Jnl. of Man Machine Studies>, Fall, 1978.|,,)

.refer (Davis77,
.|Davis R., Interactive transfer of expertise: Acquisition of#
.new inference rules, 5IJCAI (August 1977)|,,)

.refer (Davis76,
.|Davis R, <<Applications of meta level knowledge to the construction,#
.maintenance, and use of large knowledge bases>, (thesis),#
.Heuristic Programming Project Memo HPP-76-7, Stanford University,#
. (July 1976)|,,)

.refer (Davis75,
.|Davis R., King J., An overview of production systems, Memo AIM-271,#
.Computer Science Department, Stanford University, (October 1975)|,,)

.refer (Duda77,
.|Duda, R. O., Hart, P., Nilsson, N. & Sutherland, G.  "Semantic#
.network representations in rule-based inference systems",# 
.in D.A. Waterman and F. Hayes-Roth (eds.), <<Pattern Directed#
.Inference Systems>, New York: Academic Press, 1978.|,,)

.refer (Engelmore77,
.|Engelmore R.S., Nii H.P., A knowledge-based system for the interpretation#
.of protein x-ray crystallographic data, Heuristic Programming Project#
.Memo HPP-77-2 (February 1977)|,,)

.refer (Erman75,
.|Erman L.D., Lesser V.R., A multi-level organization for problem#
.solving using many, diverse, cooperating sources of knowledge,#
.4IJCAI (September 1975)|,,)

.refer (Fagan78,
.|Fagan L.M., Ventilator Manager: A program to provide on-line#
.consultative advice in the intensive care unit, Heuristic Programming#
.Project Memo HPP-78-16 (Working Paper), Computer Science Department,# 
.Stanford University (September 1978)|,,)

.refer (Feigenbaum77, 
.|Feigenbaum E.A., The art of artificial intelligence: I. Themes#
.and case studies of knowledge engineering, 5IJCAI (August 1977)|,,)

.refer (Feitelson77,
.|Feitelson J., Stefik M., A case study of the reasoning in a genetics#
.experiment, Heuristic Programming Project Report 77-18 (working paper),#
.Computer Science Department, Stanford University (April 1977)|,,)

.refer (Heiser77,
.|Heiser J.F., Brooks R.E., Ballard J.P., "Progress Report:#
.A Computerized Psychopharmacology Advisor", Proceedings of the 11th#
.<<Collegium Internationale NeuroPsychopharmacologicum>. Vienna, 1978|,,)

.refer (HR77,
.|Hayes-Roth F., Lesser V.R., Focus of attention in the HEARSAY-II#
.speech understanding system, IJCAI-77 (August 1977)|,,)

.refer (Goldstein77,
.|Goldstein I., Papert S., Artificial intelligence, language, and the study#
.of knowledge, Cognitive Science <<1:1> (January 1977)|,,)

.refer (Kunz78,
.|Kunz J.C., Fallat R.J., McClung D.H., Osborn J.J., Votteri B.A.,#
.Nii H.P., Aikins J.S., Fagan L.M., Feigenbaum E.A., "A physiological#
.rule based system for interpreting pulmonary function test results",#
.Heuristic Programming Project Memo HPP-78-19, Stanford University,#
. 1978|,,)

.refer (Lenat78,
.|Lenat D.B., The ubiquity of discovery, Artificial Intelligence#
.<<9:3> (December 1977)|,,)

.refer (Lowerre76,
.|Lowerre B.T., The HARPY speech recognition system, Doctoral thesis,#
.Department of Computer Science, Carnegie-Mellon University#
.(April 1976)|,,)

.refer (Martin77,
.|Martin N., Friedland P., King J., Stefik M., Knowledge Base Management#
.for Experiment Planning, 5IJCAI (August 1977)|,,)

.refer (Minsky75,
.|Minsky M., A framework for representing knowledge,#
.in Winston P. (ed.) The psychology of computer vision,#
.New York, McGraw-Hill (1975)|,,)

.refer (Nii77,
.|Nii H.P., Feigenbaum E.A., Rule-based understanding of signals#
.in Waterman D.A., Hayes-Roth F. (eds.) Pattern-Directed#
.Inference Systems (1977)|,,)

.refer (Scott77,
.|Scott, A.C., Clancey, W., Davis, R., and Shortliffe, E.H. Explanation#
.Capabilities Of Knowledge-Based Production Systems.  <<American Journal#
.of Computational Linguistics>, Microfiche 62, 1977.  Also available as#
.Technical Report HPP-77-1, Heuristic Programming Project, Stanford CA,#
.(February 1977)|,,)

.refer (Shortliffe76,
.|Shortliffe E., Computer-based Medical Consultations: MYCIN,#
.New York, Elsevier, (1976)|,,)

.refer (Stefik78,
.|Stefik M., An examination of a frame-structured representation system,#
.Stanford Heuristic Programming Project Memo HPP-78-13 (working#
.paper) (September 1978)|,,)

.refer (Stefik78a,
.|Stefik M., Inferring DNA structures from segmentation data,#
.Artificial Intelligence <<11> (1978)|,,)